FEAT: extend profiling to child processes by TTsangSC · Pull Request #431 · pyutils/line_profiler

TTsangSC · 2026-04-14T16:13:41Z

This PR adds support for kernprof to profile code execution in child Python processes, building on ongoing work (see Credits).

Usage

The EXPERIMENTAL new flags --no-prof-child-procs and --prof-child-procs[=...] are added to kernprof. By setting --prof-child-procs to true, child Python processes created by the profiled process are also profiled:¹

$ kernprof -lv --prof-child-procs -c "if True:
    import itertools
    import multiprocessing
    from collections.abc import Collection

    def sum_worker(nums: Collection[int]) -> int:
        result = 0
        for x in nums:
            result += x
        return result

    def sum_parallel(nums: Collection[int], nprocs: int) -> int:
        size_ = len(nums) / nprocs
        size = int(size_)
        if size_ > size:
            size += 1
        with multiprocessing.Pool(nprocs) as pool:
            sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
            pool.close()
            pool.join()
        return sum_worker(sub_sums)

    if __name__ == '__main__':
        print(sum_parallel(range(1, 1001), 3))"
500500
Wrote profile results to 'kernprof-command-<...>.lprof'
Timer unit: 1e-06 s

Total time: 0.000312 s
File: <...>/kernprof-command.py
Function: sum_worker at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                               def sum_worker(nums: Collection[int]) -> int:
     7         4          3.0      0.8      1.0          result = 0
     8      1007        155.0      0.2     49.7          for x in nums:
     9      1003        153.0      0.2     49.0              result += x
    10         4          1.0      0.2      0.3          return result

Total time: 0.100223 s
File: <...>/kernprof-command.py
Function: sum_parallel at line 12

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                               def sum_parallel(nums: Collection[int], nprocs: int) -> int:
    13         1          1.0      1.0      0.0          size_ = len(nums) / nprocs
    14         1          1.0      1.0      0.0          size = int(size_)
    15         1          0.0      0.0      0.0          if size_ > size:
    16         1          0.0      0.0      0.0              size += 1
    17         2      21685.0  10842.5     21.6          with multiprocessing.Pool(nprocs) as pool:
    18         1      68692.0  68692.0     68.5              sub_sums = pool.map(sum_worker, itertools.batched(nums, size))  # 3.12+
    19         1         27.0     27.0      0.0              pool.close()
    20         1       9800.0   9800.0      9.8              pool.join()
    21         1         17.0     17.0      0.0          return sum_worker(sub_sums)

Note how the sum_worker() calls are profiled:

The main process contributes 1 call and 3 loops summing the sub-sums.
The 3 child processes each contributes 1 call, and they loop over all 1000 of the items combined.

Highlights

Children created by (including but not limited to) these methods can be profiled:
- os.system() and subprocess.run()
- multiprocessing²
All three multiprocessing "start methods" ('fork', 'forkserver', and 'spawn') tested to be compatible, where available on the platform
Profiling unaffected by whether the profiled function run in child processes:
- Is locally defined in the profiled code or imported
- Executes cleanly or errors out
Mode of profiling (with eager --preimports or via test-code rewriting) replicated in child processes

Explanation

A serializable cache object (line_profiler._child_process_profiling.cache.LineProfilingCache) is created by the main process, containing session config information (e.g. values for --prod-mod and --preimports) so that profiling can be replicated in child processes.
In the main process, environment variables are injected, so that it and its children would have access to its PID and the cache-directory location.
A temporary .pth file is created; Python processes inheriting the right environment will thus go through profiling setup, while those without the env var (and just happens to share the Python executable) will be minimally affected.
os.fork() (where available) is patched with a wrapper which ensures consistent global states.
As with coverage.multiproc, various multiprocessing components are patched (line_profiler._child_process_profiling.multiprocessing_patches.apply()) so that child processes can retrieve the cache and explicitly cleanup before exiting. Patches are inherited by forked child processes and reapplied by spawned ones.
To make sure that multiprocessing child processes are allowed to fully clean up and write their profiling, even when the parallel workload errors out,³ additional patches are made to multiprocessing.
When properly set up, child processes write profiling output on exit to the session cache directory, which kernprof then gather and merge with the profiling result in main process.

Code changes

New code (click to expand)

`line_profiler/_threading_patches.py`

New submodule patching threading for the consistent gathering of profiling data between tracing modes.

apply():
When legacy tracing is used (Python < 3.12 or LINE_PROFILER_CORE=ctrace), patch threading.Thread.__init__() so that the profiler starts with the same .enable_count on the new thread as with the parent thread; this is necessary for correctness because the profiler may otherwise not be enabled on the new thread.

`line_profiler/cleanup.py`

New submodule defining the Cleanup class, which handles various setup/cleanup tasks like:

Registering/Calling callbacks
Creation/Deletion of tempfiles
Insertion/Reversion of environment variables
Patching/Restoration of object attributes

`line_profiler/curated_profiling.py`

New submodule containing mostly relocated code from kernprof, so that child processes can more easily reestablish profiling:

ClassifiedPreimportTargets:
Object resolving and classifying the --prof-mods, and writing a corresponding preimport module
CuratedProfilerContext:
Context manager managing the state of the LineProfiler, e.g.:
- Slipping it into line_profiler.profile on startup
- Patching threading (see _threading_patches above) so that the profiler stays enabled on newly spawned threads
- Purging its .enable_counts on teardown

`line_profiler/_child_process_profiling/`

New private subpackage for maintaining the states, setting up the hooks, and performing the patches which makes it possible to profile child processes:

cache.py::LineProfilingCache:
"Session state" object. It:
- Can be auto-(de-)serialized in the main and child processes based on env-var values, managing setup (module patches, profiler curation, eager pre-imports) and cleanup (tempfile management, dumping and gathering of profiling results) in each process.
- Injects the following environment variables, which are inherited by child processes:
  - ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_PID}: main-process PID
  - ${LINE_PROFILER_PROFILE_CHILD_PROCESSES_CACHE_DIR_<PID>}: location of the cache directory
  From the combination of both, child processes can retrieve the cache by calling .load().
multiprocessing_patches.py::apply():
Apply patches to these multiprocessing module components so that profiling results are properly gathered on child-process exit:
- Process (read: multiprocessing.process.BaseProcess):
  - ._bootstrap():
    Wrapped call to:
    - Call LineProfilingCache.cleanup() on exit.
    - Where possible, register a SIGTERM handler to do the above as a failsafe.
  - .terminate():
    Wrapped call to poll on the child process, and soft-block (with a timeout) until it is deleted.
- spawn.runpy:
  Replaced with a localized, patched clone of runpy (see runpy_patches.py below). This is necessary for profiling to function in non-eager-preimports mode (--no-preimports).
- forkserver.ForkServer:
  Global instance rebooted if reboot_forkserver=True so that child processes fork with the profiling infrastructure set up properly.
pth_hooks.py:
Facilities for effecting profiling-code execution in child processes by injecting a temporary .pth file into the current venv. This module is kept as minimal as possible to minimize the amount of startup code run as the mere result of having said .pth file.
- write_pth_hook():
  Set up a .pth file under the directory sysconfig.get_path('purelib') which calls load_pth_hook() (see below). The .pth file will be cleaned up by the supplied cache object.
- load_pth_hook():
  For processes inheriting a matching "parent PID" from the environment (see LineProfilingCache above), load the cache and set up the LineProfiler instance used, like how the main kernprof process does.
runpy_patches.py::create_runpy_wrapper():
Make a clone of the runpy module which checks if the code executed is the code to be profiled; if so, it goes through the same code-rewriting facilities that line_profiler.autoprofile.autoprofile.run() uses to set up profiling.
threading_patches.py::apply():
Patch threading.Thread.__init__() so that when using legacy tracing the profiler is suitably enabled on newly-spawn thread, depending on whether it is enabled on the current thread.

`tests/conftest.py`

Add the @pytest.mark.retry marker so that we can retry flaky tests.

_RetryEntry:
Object containing info on retried test items so that they can be summarized when the session ends.
_RetryHelper:
Object implementing the retrying of failing test functions (pytest.Function) and the teardown of function-scoped fixtures between retries.
- pytest_pyfunc_call():
  Hook implementation for instantiating a helper to handle test rerunning if necessary.
- get_helper():
  Instantiate from a pytest.Function marked with @pytest.mark.retry(...).
pytest_addhooks():
Hook implementation for registering _RetryHelper as a plugin.
pytest_configure():
Hook implementation for defining the @pytest.mark.retry marker, so that it doesn't produce a pytest.PytestUnknownMarkWarning when used.
pytest_terminal_summary():
Hook implementation for writing a summary section if any test is retried.

`tests/test_retry_tests.py`

Test that the @pytest.mark.retry marker:

Properly handles fixture scoping (test_fixture_scoping()), esp. the teardown on non-persisted function-scoped fixtures between retries (test_fixture_teardown())
Allows for constraining the "allowed" error classes where retries are attempted (test_exception_restrictions())
Writes a summary of the retried tests at the end of the session.
Does not produce a pytest.PytestUnknownMarkWarning when used.

`tests/test_child_procs.py`

_ModuleFixture:
Helper object which handles:
- Module-name mangling (à la tests/test_cython.py::propose_name()) to avoid clashes; and
- Installation into and cleanup from sys.path and os.environ['PYTHONPATH'].
_Params:
Helper object which handles concatenation and Cartesian products of parametrizations.
ext_module:
New _ModuleFixture representing a module defining the sum function used by test_module when run without the --local flag.
_run_subproc():
New wrapper around subprocess.run() which provide extra debugging output (standard streams, timing info, etc.)
"Unit tests" for the line_profiler._child_processing_profiling components, or as close as is possible thereto:
- test_runpy_patches():
  Test the functionality of ~.runpy_patches.create_runpy_wrapper().
- test_cache_dump_load():
  Test the functionalities of ~.cache.LineProfilingCache.dump() and .load().
- test_cache_setup_main_process():
  Test the functionality of ~.cache.LineProfilingCache._setup_in_main_process().
- test_cache_setup_child():
  Test the functionality of ~.cache.LineProfilingCache._setup_in_child_process().
- test_load_pth_hook():
  Test the functionality of ~.pth_hook.load_pth_hook().
- test_apply_mp_patches():
  Test the functionality of ~.multiprocessing_patches.apply().
test_profiling_multiproc_script():
"Main" new test for running the test_module (see Modified Code) with kernprof --prof-child-procs; heavily parametrized to check for profiling-result correctness in different contexts:
- run_func: execution modes (kernprof <script>, kernprof -m <module>, and kernprof -c <code>)
- prof_child_procs; whether to use child-process profiling (--[no-]prof-child-procs)
- preimports: eager vs. on-import profiling (--[no-]preimports)
- use_local_func: whether the parallel workload is locally defined in the executed code or imported from external modules
- fail: whether the parallel workload errors out
- start_method: multiprocessing "start methods" ('fork', 'forkserver', and 'spawn')
test_profiling_bare_python():
New test for profiling child processes where the code run by kernprof --prof-child-procs doesn't directly invoke multiprocessing, but spins up another Python process that does (via os.system() or subprocess.run()).

Modified code (click to expand)

`line_profiler/line_profiler.py::LineStats`

.get_empty_instance():
New convenience class method for creating an instance with no profiling data and the platform-appropriate .unit.
.from_files():
Added new argument on_defective: Literal['ignore', 'warn', 'error'], allowing for passing over bad files (e.g. empty ones) with optional warnings. The old behavior (on_defective='error') remains the default.

`line_profiler/rc/line_profiler.toml`

[tool.line_profiler.kernprof]:
New key-value pair prof-child-procs for the default of the kernprof --[no-]prof-child-procs flag.
[tool.line_profiler.multiprocessing]:
New key-value pairs for controlling the application of the .multiprocessing_patches:
- intercept_logs (bool):
  Whether to patch methods like multiprocessing.util.debug() to also capture the multiprocessing logs.
- polling:
  Behavior for the polling for child-process termination:
  - cooldown (float):
    Time (s) to wait between polls.
  - timeout (float):
    Timeout (s) for the overall polling.
  - on_timeout (Literal['error', 'warn', 'ignore']):
    What to do when the polling timed out.
The multiprocssing table and its contents are as of yet considered private implementation details.

`kernprof.py`

_add_core_parser_arguments():
Now adding the new --[no-]prof-child-procs flags to the parser.
_write_preimports():
Refactored to use the new/relocated facilities at line_profiler.curated_profiling.
_dump_filtered_stats():
- New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
- Partially split off into the new _dump_filtered_line_stats() which it now calls.
_manage_profiler:
Context manager refactored from the old _pre_profile() for more Pythonic handling of setups and teardowns.
- Added setup for the session cache via calling _prepare_child_profiling_cache().
- The old function body is split off into smaller components (_prepare_profiler(), _prepare_exec_script()).
- Now calling _post_profile() on context exit so that we no longer have to explicitly try: ... finally: ... in _main_profile().
_post_profile():
- New argument extra_line_stats: LineStats | None allows for handling and combining the profiling stats gathered elsewhere (e.g. child processes).
- Simplified because some of the cleanup is relocated to line_profiler.curated_profiling.

`tests/test_child_procs.py`

test_module:
- Refactored from a Path fixture into a _ModuleFixture (see above in New Code).
- Added the following command-line flags to the code:
  - --start-method selects a specific multiprocessing "start method".
  - --local toggles between using a sum function defined locally in test_module or the one defined externally in ext_module (see New Code).
  - --force-failure toggles whether the sum function should return normally or raise an error.
_run_as_{script,module}():
- Now joined by a _run_as_literal_code() to also test kernprof -c ....
- Now taking test_module as a _ModuleFixture instead of a path, and handling its installation.
_run_test_module():
- New convenience wrappers run_module = partial(_run_test_module, _run_as_module), etc. now available for more convenient testing of kernprof execution modes as test parametrization.
- New arguments:
  - profiled_code_is_tempfile: bool helps with constructing the kernprof command line in cases where the code is anonymous (kernprof -c ...).
  - use_local_func: bool, fail: bool, and start_method: Literal['fork', 'forkserver', 'spawn'] | None allows for fuzzing code execution with the aforementioned test_module CLI flags (resp. --local, --force-failure, and --start-method).
  - nhits: dict[str, int] | None, when provided, checks that the line-hit stats are as expected (all calls traced with --prof-child-procs, only those in the main process without).
- Added checks:
  - If fail is true, the kernprof subprocess should fail.
  - Temporary .pth files created by kernprof --prof-child-procs should be cleaned up.
  - Profiling output is consistent with the provided nhits (where available).
test_multiproc_script_sanity_check():
- Now fuzzing the parametrizations use_local_func, fail, and start_method, to ensure that the test script is fully functional in vanilla Python.
- Superseded the argument as_module: bool with run_func: Callable[..., CompletedProcess], allowing for more flexible testing of execution modes (python ..., python -m ..., and the new python -c added via the aforementioned _run_as_literal_code()).
test_running_multiproc_script():
New parametrization run_func allows for absorbing the old test_running_multiproc_module() into the same test as additional parametrization, as well as testing kernprof -c.

Caveats

The temporary .pth file created is course benign and as mentioned tries to be as out of the way as possible, but I just figured that the use of .pth files should be called out, given their recent spotlight in a CVE vulnerability.
Since the .pth file is written to sys.get_path('purelib'), it depends on said directory being writable. If we aren't in a venv or a similarly isolated environment (which is increasingly unlikely nowadays), all processes using the system Python will have to import and run line_profiler._child_process_profiling.load_pth_hook(). Though the function itself should quit rather quickly when we're not in a child process, it may still entail loading a significant portion of line_profiler into sys.modules.
Blocking Process.terminate()³ is a bit dodgy:
- To avoid uncontrolled hot-looping .poll()-ing for the activity of the child-process PID, I'm just using a 1/32-s cooldown.
- After all, there's a reason Process.terminate() just SIGTERMs the child process with reckless regard – children which errored out are sporadically stuck in an unclean state, and the polling may enter a deadlock. To guard against that I added:
  - A 0.25-s timeout for the polling, after which we just issue a warning and the SIGTERM is sent anyway
  - For tests running kernprof --prof-child-procs or equivalent code:
    - A 5-s timeout for subprocess.run().
    - The opportunity to retry twice if the test failed by insufficient profiling data (because the child process did clean up properly) or if the polling timed out and errored out; the latter doesn't currently happen but can be configured to
  This seems to be enough to both get rid of the deadlocks in tests and preserve profiling data... but the problem is that for child processes to deadlock AT ALL, their cleanup routines must have (of yet) failed to complete, and thus there is still a risk of profiling data not being written. So there's probably either some race conditions hidden by the delays, or an error in how the PID statuses are detected.
(Note however that on non-Windows platforms we can and do just setup a SIGTERM handler,⁴ which ensures that cleanup finishes before the child succumbs to termination. But yeah on Windows it's just the polling and the delays keeping us afloat.)
Apparently coverage gets by alright by only patching Process._bootstrap(), without the above termination issue. Gotta figure out why...

TODO

Add documentation on this new feature.
Maybe we should indicate this feature to be experimental...
Would it make more sense for any of the content in line_profiler._child_process_profiling to become public API?

Credits

Loosely based on similar work I did for pytest-autoprofile, which in turn was based on the solution implemented in coverage.multiproc.
Related PRs:

Notes

Welp. This took way longer than I expected. The main friction points were that:

There isn't a pre-existing "global-ish" state object that I can leverage, and which can be easily replicated in subprocesses. The new line_profiler._child_process_profiling.cache.LineProfilingCache class tackles this issue.
I had a very hard time trying to make profiling results consistent even when the parallelly-executed function errors out. Would have thought that I already took care of that in the other project (see pytest-autoprofile::tests/test_subprocess.py::_test_inner()), but apparently I only made the tests fail there, not the parallel functions themselves. Had to do some rather hacky stuff to circumvent that (see Caveats)...³

Note however that the equivalent vanilla Python command (python -c ...) would error out, because functions sent to multiprocessing must be pickle-able and thus reside in a physical file. This is sidestepped by kernprof's always writing code received by kernprof -c ... and ... | kernprof - to a tempfile (ENH: auto-profile stdin or literal snippets #338). ↩
In the test suite we're only testing process creation with the most common multiprocessing[.get_context(...)].Pool. However, since none of the patched components are specific to multiprocessing.pool, it should also work with other model of parallelism built with the components of multiprocessing. ↩
From the docs for mulitprocessing.Process.terminate(): Note that exit handlers and finally clauses, etc., will not be executed. Normally this doesn't matter, but if the parallelly-executed function errors out, multiprocessing has a bad habit of just .terminate()-ing child processes without allowing for enough time to run cleanup, leading to incomplete profiling data. Hence the only workaround seems to be intercepting Process.terminate() calls and blocking them where appropriate. ↩ ↩² ↩³
Handling SIGTERM on Windows is generally noted to be inconsistent. See e.g. this coverage.control.Coverage._init_for_start() comment and this SO discussion that it refers to. ↩

TTsangSC · 2026-04-15T03:46:59Z

Did some more tests on local post-#428-merge, maybe it is just legacy Python and dependency versions causing the issues. Will just rebase, force-push, and see what happens.

- `line_profiler/curated_profiling.py` New module for setting up profiling in a curated environment - `ClassifiedPreimportTargets.from_targets()` Method for creating a `ClassifiedPreimportTargets` instance, facilitating writing pre-import modules in a replicable and portable manner - `ClassifiedPreimportTargets.write_preimport_module()` Method for writing a pre-import module based on an instance; also fixed bug where the body of the written module was intercepted without appearing in the debug output - `kernprof.py` - `_gather_preimport_targets()` Migrated to `line_profiler.curated_profiling` - `_write_preimports()` Now using the new `ClassifiedPreimportTargets` class, moving esp. the logic to the `write_preimport_module()` method

- `kernprof.py::_manage_profiler` `line_profiler/curated_profiling.py::CuratedProfilerContext` New context-manager classes for handling profiler setup and teardown - `kernprof.py::_pre_profile()` Refactored into the above context managers and other private functions (`_prepare_profiler()`, `_prepare_exec_script()`)

line_profiler/_child_process_profiling/cache.py::LineProfilingCache New class for passing info onto child processes so that profiling can resume there line_profiler/pth_hook.py New submodule for the .pth-file-based solution to propagating profiling into child processes: write_pth_hook() In the main process, write the temporary .pth file to be loaded in child processes load_pth_hook() Called by the .pth in child process, loading the cache and setting up profiling based thereon

line_profiler/_child_process_profiling/cache.py::LineProfilingCache Added new `.profile_imports` attribute to correspond to `kernprof`'s `--prof-imports` flag line_profiler/_child_process_profiling/meta_path_finder.py New submodule defining the `RewritingFinder` class, a meta path finder which rewrites a single module on import line_profiler/_child_process_profiling/pth_hook.py write_pth_hook() Now also handling the `os.fork()` patching/wrapping _setup_in_child_process() Now creating a `RewritingFinder` to mirror what `~.autoprofile.autoprofile.run()` does in the main process .

line_profiler/_child_process_profiling/cache::LineProfilingCache Refactored `.load()` line_profiler/_child_process_profiling/multiprocessing_patches.py New submodule for applying patches to the `multiprocessing` package, so that profiling is automatically set up in child processes created by it

line_profiler/_child_process_profiling/cache.py::LineProfilingCache <general> Added debug logging to various methods gather_stats() New method for gathering profiling stats from child processes inject_env_vars() New method for injecting `.environ` into `os.environ` line_profiler/line_profiler.py::LineStats get_empty_instance() New convenience method for creating an empty instance from_files() Added new argument `on_defective` to allow for processing a group of files that cannot all be correctly read

line_profiler/rc/line_profiler.toml::[tool.line_profiler.kernprof] Added new key-value pair `prof-child-procs` for the default value of `kernprof --prof-child-procs` kernprof.py - New boolean flags `[--prof-child-procs[=...] | --no-prof-child-procs]` for controlling whether to set up profiling in child processes - Fixed bug in `_manage_profiler.__exit__()` where `CuratedProfilerContext.uninstall()` can be skipped if the preceding code raises an error

kernprof.py::_prepare_child_profiling_cache() - Now respecting ${LINE_PROFILER_KEEP_TEMPDIRS} - Now setting `LineProfilingCache.debug` line_profiler/_child_process_profiling/cache.py::LineProfilingCache - Added new attributes `.debug` and `._debug_log` - Now diverting debug messages to log files in `.cache_dir`

line_profiler/_child_process_profiling/cache.py::LineProfilingCache add_cleanup() Now deferring to a `._add_cleanup()` method which allows for cleanup-function prioritization _debug_output() Fixed type-checking line_profiler/_child_process_profiling/multiprocessing_patches.py ::apply() Added debug output before `_setup_in_child_process()` is called to help with tracing line_profiler/_child_process_profiling/pth_hook.py load_pth_hook() _wrap_os_fork() Added debug output before `_setup_in_child_process()` is called to help with tracing _setup_in_child_process() - `wrap_os_fork` now defaults to false - `prof.dump_stats()` now has increased priority over other callbacks (doesn't seem to help with the malformed prof files though...) - Child-process profiling output now written to a less randomized filename to facilitate debugging

line_profiler/_child_process_profiling/cache.py::LineProfilingCache profiler New attribute for the profiler instance copy(..., inherit_profiler=...) New argument for inheriting the `.profiler` load() Now keeping track of the loaded instance and returning it in subsequent calls line_profiler/_child_process_profiling/multiprocessing_patches.py ::apply(..., lp_cache=None) - If the `LineProfilingCache.load()`-ed instance is consistent with that loaded from `cache_path`, the former is used - Added more debugging output line_profiler/_child_process_profiling/pth_hook.py load_pth_hook() Added more debugging output _wrap_os_fork() Updated debugging output _setup_in_child_process() - Now returning a boolean (whether setup has been newly done) - Now setting `.profiler` of the cache instance - Added moew debugging output

kernprof.py::_manage_profiler.__enter__() Updated so that the created `LineProfilingCache` instance carries a `.rewrite_module` line_profiler/_child_process_profiling/cache.py::LineProfilingCache Added an optional `.rewrite_module` attribute line_profiler/_child_process_profiling/import_machinery.py ::RewritingFinder.find_spec() Now looking at `.lp_cache.rewrite_module` (where available) to check for specs to return

line_profiler/_child_process_profiling/ cache.py::LineProfilingCache _replace_loaded_instance() New convenience method for an instance in a fork to replace the instance to be `.load()`-ed _consistent_with_loaded_instance New attribute for checking whether the instance is consistent with what would have been `.load()`-ed multiprocessing_patches.py bootstrap(..., lp_cache=...) Can now be `None`, which defers the `.load()`-ing of the cache instance apply() - Streamlined logic for retrieving the loaded instance - Now using the above deferred loading whenever appropriate, so that cleanup and profiling is preserved in forked processes pth_hook.py::_wrap_os_fork() Now using `._replace_loaded_instance()`, so that future calls to `.load()` in the forked process retrieves the newly-created instance

kernprof.py::_prepare_child_profiling_cache() - Updated call to `[...].multiprocessing_patches.apply()` - Now always setting up the created instance as the one returned by further calls to `.load()` line_profiler/_child_process_profiling/multiprocessing_patches.py PickleHook - Refactored to contain no instance variables - Now always using `LineProfilingCache.load()` to retrieve the appropriate cache instance bootstrap() Removed argument `lp_cache` get_preparation_data() Removed arguemnt `cache_path` apply() - Removed argument `cache_path` - Argument `lp_cache` now required - Simplified implementation

line_profiler/_child_process_profiling/import_machinery.py Removed line_profiler/_child_process_profiling/pth_hook.py ::_setup_in_child_process() No longer set up the `RewritingFinder` because messing with the import system doesn't help with propagating autoprofiling rewrites to child processes...

kernprof.py _dump_filtered_stats() Fixed bug where if no tempfile remains, the `extra_line_stats` are not merged into the dumped stats _prepare_child_profiling_cache() Now setting the `.profiler` of the returned cache object

line_profiler/_child_process_profiling/multiprocessing_patches.py ::_apply_mp_patches() - Added debugging output for the patches - Now patching the copy of `runpy` imported by `multiprocessing.runpy` line_profiler/_child_process_profiling/pth_hook.py _wrap_os_fork() No longer creating a new `LineProfiler` instance (helps with handling forked processes) _setup_in_child_process(..., prof=...) New argument for avoiding instantiating a new profiler when not necessary (e.g. in a forked process) line_profiler/_child_process_profiling/runpy_patches.py New submodule for the aforementioned patching of `runpy`

tests/test_child_procs.py test_running_multiproc_literal_code() New test paralleling `test_running_multiproc_{script,module}` to test `kernprof -c ...` test_multiproc_script_sanity_check() - Refactored parameters for better `pytest` output - Added testing for running the code with `python -c ...` <Misc> - Added CLI argument `--local` to the profiled module to toggle between a locally-defined summing function and an imported one - Refactored how the test modules are injected - Added debugging output to `subprocess.run()` calls - Added provisional support for examining the profiling data

tests/test_child_procs.py::test_multiproc_script_sanity_check() Now parametrized to test passing the function defined in the test module itself to `multiprocessing`

tests/test_child_procs.py test_running_multiproc_{module,literal_code}() Integrated into `test_running_multiproc_script()` test_running_multiproc_script() Extended parametrization

tests/test_child_procs.py test_profiling_multiproc_script() Test parallel to `test_running_multiproc_script()`, checking whether we are correctly profiling the child processes <General> - Added more docs - Updated dummy parameter names _ext_module, _test_module - Refactored how the fixtures are set up - Module names now randomized and clash-proof via `uuid.uuid4()` _run_subproc() - Moved code outputting captured streams from `_run_test_module()` to here - Added timing code

tests/test_child_procs.py TEST_MODULE_BODY, [_]test_module() Added CLI flag to select `multiprocessing` start methods _Params New convenience class for test parametrization test_multiproc_script_sanity_check() - Streamlined parametrization (15 subtests -> 10) - Added subtests for various `multiprocessing` start methods test_multiproc_script_sanity_check() - Streamlined parametrization (24 subtests -> 21) - Added subtests for various `multiprocessing` start methods

tests/test_child_procs.py test_module(), ext_module() Updated so that we can toggle for the function sent to `multiprocessing` to raise an error with the `--force-failure` CLI flag _run_test_module() - Now raising a new `ResultMismatch` error class (instead of using base assertions) for: - If `test_module()` writes the wrong number to stdout - If `nhits` are provided and the profiling results differ therefrom - Added argument `fail` for using the aforementioned `--force-failure` flag test_multiproc_script_sanity_check() Now also chceking the cases where the test module is run with `--force-failure` test_profiling_multiproc_script() Now also chceking the cases where the test module is run with `--force-failure` (FIXME: profiling bugged when the function errors out, and doesn't fail with a consistent pattern)

line_profiler/_child_process_profiling/multiprocessing_patches.py ::_Poller.__enter__() If `on_timeout='warn'`, the log message is now written BEFORE issuing the warning (so as to avoid having the latter preventing the latter via raising the warning) tests/test_child_procs.py _Param.sorted() New method for sorting the parametrizations of instances test_apply_mp_patches() - Renamed parameter `intercept_logs` from `debug` - Removed several redundant subtests (no need to test `intercept_logs` with all `multiprocessing` start methods) test_multiproc_script_sanity_check() Removed several redundant subtests (tests with `preimports=True` overlapping with `test_apply_mp_patches`)

TTsangSC · 2026-05-02T02:59:50Z

Hi @Erotemic, I think this is ready for review if you have the time... sorry for the metric ton of code.

Erotemic

It will take me a while to fully go through this, but here is a start.

Erotemic · 2026-05-02T20:06:38Z

+        ...     print("We probably didn't count up to 100 but whatever")
+        We probably didn't count up to 100 but whatever
+
+        >>> with (  # doctest: +NORMALIZE_WHITESPACE


Note, because we use xdoctest, you should be able to put # doctest: +NORMALIZE_WHITESPACE as a standalone comment above the code for it to apply to every line after. Although I don't often use NORMALIZE_WHITESPACE, so I can't say I'm 100% confident in that.

Erotemic · 2026-05-02T20:18:24Z

+        dir=get_path('purelib'),
+    )
+    try:
+        pth_content = 'import {0}; {0}.load_pth_hook({1})'.format(


This has the potential to cause some import overhead. IIUC, we are going to effectively import the entire line_profiler.init. I'm not sure if that's going to be noticeable or not. One idea is to make a separate _line_profiler_hooks package that's just a minimal top level module for very fast import, but that may be overengineering. I'd need to think about it and probably test it.

One data point on my machine, though I think other platforms are most likely gonna behave the same: importing line_profiler._child_process_profiling.pth_hook causes the following to be imported:

All top-level line_profiler submodules except the following:

~.cleanup

~.curated_profiling

~._threading_patches

~.ipython_extension

None of the line_profiler.autoprofile components, and

None of the line_profiler._child_process_profiling components except ~~.pth_hook itself.

So, as one would intuit, most of the "core" stuff directly used by line_profiler.LineProfiler and @line_profiler.profile. Could be worthwhile to set it up as a separate namespace... and it should be trivial to configure, given that we already create two separate namespaces (line_profiler and kernprof).

The remaining question is more of a design one: should write_pth_hook() stay in said namespace for symmetry with load_pth_hook(), or should it become an instance method of LineProfilingCache's?

Erotemic · 2026-05-02T20:20:07Z

+        )
+        if not fnames:
+            return LineStats.get_empty_instance()
+        return LineStats.from_files(*fnames, on_defective='ignore')


Maybe on_defective='warn' makes more sense here?

Could be. The original motivation is that:

Since the ._setup_in_child_process() code is run in all child processes at startup incl. unused worker processes and the fork-server process, each such process occupies and secures a tempfile name (*.lprof) by touching it. Said tempfile is supposed to be written to when the corresponding Python process terminates.

But then some of these files could result in errors upon reading: processes killed via signals before cleanup is run result in empty tempfiles, and those killed mid-cleanup make for corrupted ones.

I opted to suppress the error/warning messages given that they originate directly from pickle and are thus not the most intuitive/helpful.

But maybe that's just indication that we should've vet the input files better before passing them onto pickle. And given that the rest of the PR has seen a lot of iteration after these lines were first written, we should probably get a lot fewer false alarms from the warnings, and where we do have something to warn it will be of actual problematic cases (e.g. unclean exits resulting in incomplete data).

Erotemic · 2026-05-02T20:24:05Z

+    """
+    # XXX: why can `coverage` get away with not doing all these
+    # lock-file hijinks and just patching `BaseProcess._bootstrap()`?
+    def get_poller_args(


Do these need to be nested functions? Would a helper class make more sense than relying on closures / be more testable?

Do we have a test for a worker that does some non trivial processing and then intentionally terminates so we can have an explicit comparison between the unpatched and patched terminate and make sure they both behave similar enough?

Yeah I should probably refactor wrap_terminate() for readability. We've barely used closure variables in the nested funcs anyway (just cache in process_has_returned()); still, that can be easily remedied, given that I can just pass cache to the function in the _Poller.poll_until() constructor.

We do test Process.terminate() as shown in the coverage report, albeit indirectly:

In normal ("happy-path") code execution Process.terminate() isn't even called – AFAIK it only happens when the parallel workload errored out, which for whatever reason may leave the child process stuck "in limbo", where e.g. atexit cleanup callbacks may or may not be executed, or be in various stages thereof.

Such children are .terminate()-d as a part of finalization of the parent Pool (see Pool.__init__() and .terminate()), which in particular does not wait for child-process cleanup, thus necessitating the patch.

In order to test this we have tests¹ where the parallel workload can be "set up for failure". Those are where the coverage on wrap_terminate() comes from. A quick check is how running the test suite with -k "test_child and success" only gives 52% coverage on multiprocessing_patches.py while -k "test_child and failure" gives 72%.

Given that the existing code do call vanilla_impl() in a finally clause, it is probably the closest we can get to the unpatched behavior in that we (1) do eventually send the SIGTERM, and (2) before that, give child processes (as far as possible – which isn't 100% of the time, hence the necessity of the timeout) the required grace to run cleanup and write profiling data. Anyhow, maybe we can look into tests where we explicitly create a Process object and later .terminate() it, if that would be more assuring.

Footnotes

Granted, we just sum over consecutive integers in those tests. They have the benefit that the workload size corresponds directly to the num-hits on the loop line, and hence we can keep track of whether the gathered profiling data are complete. If that's too trivial maybe we can do Fibonacci or something instead. ↩

Erotemic · 2026-05-02T20:24:32Z

+        self, obj: Any, attr: str, value: Any, *,
+        name: str | None = None,
+        cleanup: bool = True,
+        priority: float = 0,


unused argument.

Thanks, good catch!

Erotemic · 2026-05-02T20:26:28Z

+    )
+
+
+@pytest.mark.retry(_NUM_RETRIES,


Why is a retry necessary here? Is there something non deterministic going on? Any chance we are hiding a race condition? Similar question with other pytest.mark.retry marked tests.

The short answer

Yes, workers with failing workload ends up in limbo sometimes, as mentioned in the other comment. It should theoretically be possible to drop the retry on POSIX (and even the whole wrap_terminate() deal) because of the use of SIGTERM handlers, but the consensus seems to be that signal handling is busted on Windows, so we still need the timeout which is by nature flaky.

The long answer

Again, processes receiving parallel workloads which don't error out seem to always exit cleanly and never need to be .terminate()-d. Those which don't... don't. Which sets up the potential race condition: the failing child could be cleaning up, stuck in limbo (possibly waiting for communication from the parent?), or whatever; meanwhile the parent don't really care either way, because it already received all the results (i.e. pickled return-value objects or exceptions) from the children, and is ready to .terminate() whichever child process that remain alive.

In earlier iterations of the PR I just set each child process up to manage a lock file, and had the parent process wait without a timeout for said file before sending the SIGTERM (which is essentially the only thing that the vanilla Process.terminate() does). Which then caused the tests to sometimes hang indefinitely, meaning that in some failing child processes we never naturally got to the point where the finally: cache.cleanup() clause is executed. Unfortunately such failure also seems to be nondeterministic, at least at a cursory glance.

Hence the timeout. Which did alleviate the issue in that we do get all of the profiling data most of the time – because the parent process no longer willy-nilly kills the children, at least until the timeout expires. This handles cases where the child process is healthy enough to actually attempt cleanup, but would have been prevented from doing so by the parent's .terminate()-ing it.

But then of course there are still corner cases for when the children is FUBAR – and I think it can be argued that those cases are what the Process.terminate() method is for in the first place. Like, in a perfect would the child process would've just executed its atexit hooks and exited non-zero, right after raising the error and pushing it to the communication queue with the parent. But obviously that doesn't always happen. In those cases anything goes... hence the retries.

Notably, coverage also seems to be having trouble with consistently handling data collection when child processes are .terminate()-d, and just gives up while on Windows:

@pytest.mark.skipif(env.WINDOWS, reason="SIGTERM doesn't work the same on Windows") @pytest.mark.flaky(max_runs=3) # Sometimes a test fails due to inherent randomness. Try more times. class SigtermTest(CoverageTest):

What do we do?

I didn't like having to retry either – but here we are. But maybe we can at least refactor and separate the tests, so that we make sure that the mark is only applied to cases where it might be needed/justified (i.e. tests with failing parallel workloads and on Windows).

That or we just skip (as coverage does) or XFAIL on Windows.

line_profiler/line_profiler.py <General> Remapped some `typing` imports and annotations because we're now on 3.10+ LineStats.__eq__(), .__[i]add__(), from_stats_objects() Fixed misrepresentation of the data structure in docstrings (filenames and function names swapped in `.timings` keys) LineStats.from_files() Added `on_empty` argument to handle empty files (that we know `pickle.load()` will fail on) and special handling therefor, so that error/warning messages are more informative

line_profiler/_child_process_profiling/cache.py ::LineProfilingCache.gather_stats() Now emitting warnings for all anomalous cases (empty and defective files) line_profiler/cleanup.py::Cleanup.patch() Fixed bug where the `priority` argument is unused, instead of passed to `.add_cleanup_with_priority()` as advertised

line_profiler/_child_process_profiling/cache.py::LineProfilingCache gather_stats() - Updated call signature - Now excluding the profiling-stats file from the `multiprocessing` resource-tracker server process, which is often empty because of process longevity and would have resulted in a stray warning _get_profiling_outfiles() New helper method for finding specific profiling-stats files line_profiler/_child_process_profiling/pth_hook.py::write_pth_hook() The .pth file written is now cleaned up with increased priority so as to be less disruptive to other Python processes; most importantly, it is removed before the `multiprocessing` fork-server process is rebooted, so that the new server process will not be polluted by profiling stuff kernprof.py _manage_profiler.__enter__(), .__exit__() Reworked cleanup-task priorities: - Writing of debug logs now managed by the `CuratedProfilerContext` instead of the `LineProfilingCache` - `LineProfilingCache.cleanup()` now called before the gathering of child-process profiling results, to ensure that results from the `multiprocessing` fork-server process is properly handled _prepare_child_profiling_cache() - Updated call signature, taking also a `CuratedProfilerContext` - Said context (instead of the cache) now manages the lifetime of the tempdir

line_profiler/_child_process_profiling/cache.py::LineProfilingCache write_pth_hook() Refactored from `~~.pth_hook.write_pth_hook` ._setup_in_main_process() Now using `.write_pth_hook()` ._config_source Convenience property for accessing the `ConfigSource` object loaded from `.config` _line_profiler_hooks.py Moved from `line_profiler/_child_process_profiling/pth_hook.py` setup.py Now installing `_line_profiler_hooks`

_line_profiler_hooks.py write_pth_hook() Removed (superseded by `LineProfilingCache.write_pth_hook()`) load_pth_hook() Fixed broken relative imports from `line_profiler` tests/test_child_procs.py Fixed import location of `load_pth_hook()`

line_profiler/_child_process_profiling/cache.py ::LineProfilingCache.gather_stats() Now only excluding profiling-stats files from the "auto-PIDs" if they are empty, to catch the corner case where a properly-profiled child process happens to share a PID with a non-contemporary process that we exclude (currently only the `multiprocessing.resoure_tracker.ResourceTracker` process) tests/test_child_procs.py::test_apply_mp_patches() Added call to `LineProfilingCache.cleanup()` before `.gather_stats()` and the inspection of the cache-dir contents to better replicate the `kernprof` workflow; in particular this makes sure the fork-server process is properly terminated and its profiling-stats file is written to

line_profiler/_child_process_profiling/multiprocessing_patches.py _PollerArgs New helper class for getting `_Poller` args from `LineProfilingCache.config` _get_config() Instead of taking an argument from which a `ConfigSource` can be loaded, now taking the `ConfigSource` itself wrap_terminate() Refactored internals into separate functions wrap_bootstrap(), _apply_mp_patches() Updated calls to `_get_config()`

TTsangSC · 2026-05-05T21:32:57Z

I have a new idea for improving determinism in profiling multiprocessing stuff.

The main culprit for failure in profiling child processes is that:
- While by wrapping Process._bootstrap() we are "guaranteed" on Python level that cleanup will occur after the parallel workload exits, it is important to note that the processes created by Pool doesn't directly have the callables we sent to the pool (via e.g. .map(), .starmap(), or .apply()) as their workloads. Instead their workload is the multiprocessing.pool.worker() function, which fetches tasks (i.e. our callables and their arguments) from an inqueue and pushes the results (either the return values of the parallel workloads or the errors they raised) back into an outqueue.
- The Pool object in the main process communicates with the children via said queues, and sees it fit to (OS-level) terminate them as long as all the results have been collected – even if the workload (i.e. worker()) hasn't exited per-se.
To maintain Python-level control, we must write profiling results on the task level, before the result is handed back to the pool. This can be achieved via patching the methods where tasks are pushed onto the queues, slipping in calls to LineProfilingCache.load().profiler.dump_stats() – namely Pool._get_tasks() and Pool._guarded_task_generation().
Unfortunately everything that needs to be communicated between processes has to be pickle-able, meaning that we can't simply create wrapper functions around the original callable. Still, if we wrote a helper class which does the wrapping on its .__call__() method it should be workable.

However, this strat does have complications:

Generally we have one profiler instance (LineProfilingCache.load().profiler) per child process. Since each process can handle multiple tasks, we must be careful against double-counting profiling data. Maintaining a single per-process filename handle to which profiling data is written and overwritten may be a solution. That or we reset the profiler after every task, but we haven't merged feat: Add reset_stats method to LineProfiler for resetting accumulated profiling data #322 yet.
This works fine and all with Pool-based parallelism, but fails if one sidestepped the Pool and otherwise managed their own Processes. Maybe we can have a switch somewhere for choosing whether to patch Pool or Process (or both, but we'll have to make sure the patches don't clash and cause double-counting), like how coverage allows for configuring which patch(es) to apply.

tests/test_child_procs.py _Params.drop_params(), .split_on_params() New helper methods for constructing instances based on existing ones _check_warnings New context manager for helping with checking the absence and presence of warnings at exit time _run_test_module() (This function underlies these tests: - `test_multiproc_script_sanity_check()` - `test_running_multiproc_script` - `test_profiling_multiproc_script_{success,failure}()`) Added protection against stray .pth file created when running `kernprof` test_cache_setup_child() Added check that warnings for empty profiling files are emitted iff expected test_apply_mp_patches_{success,failure}() - Split parametrized test into separate tests, and only allow retries on the known flaky cases - Added check that no stray warnings are emitted test_profiling_multiproc_script_{success,failure}() test_profiling_bare_python_{success,failure}() Split parametrized tests into separate tests, and only allow retries on the known flaky cases

line_profiler/_child_process_profiling cache.py::LineProfilingCache._handle_signal() Added more output and handling for the outcome of the `.cleanup()` call multiprocessing_patches.py::wrap_bootstrap() Now calling cleanup with the `new_thread=True` argument to prevent deadlocks when SIGTERM arrives at an inopportune time -- fellow child threads seem to play well together with the lock, but not the main thread and a child thread line_profiler curated_profiling.py::ClassifiedPreimportTargets.write_preimport_module() line_profiler.py::LineStats.from_files() Fixed calls to `warnings.warn()` so that the warnings are attributed to more informative frames

line_profiler/line_profiler_utils.py::CallbackRepr Migrated from `line_profiler/cleanup.py::_CallbackRepr`

line_profiler/_child_process_profiling/ cache.py::LineProfilingCache _dump_stats() New pseudo-method set by `._setup_in_child_process()` for writing profiling output @_method_wrapper Refactored implementation to use `._format_call()` for formatting calls in debug-log messages _format_call() New convenience method for formatting a call runpy_patches.py::_run() Refactored implementation to use: - `cache._format_call()` for formatting calls in debug-log messages - `Cleanup.patch()` for monkeypatching `exec()`

tests/conftest.py _RetryFailure New exception class for when a retry condition fails evaluation _RetryHelper condition New attribute for text condition manage_call() Added handling for evaluating `.condition` before each retry using the test function's globals locals get_helper() Added handling for the `condition=...` parameter of the marker: - If `False`, returns `None` - If a string, returns an instance with `.condition` set pytest_configure() Updated help text for the `@pytest.mark.retry` marker tests/test_retry_tests.py _TestModule run() New params `additional_std{out,err}_lines` for additional patterns to match the output streams to check_verbose_summary(), check_concise_summary() New params `std{out,err}` for additional patterns to match the output streams to conditions_module New `_TestModule` fixture for a test module using `@pytest.mark.retry(condition=...)` test_retry_conditions() New test using `conditions_module` to check the behavior of `@pytest.mark.retry(condition=...)`

line_profiler/_child_process_profiling/multiprocessing_patches.py TaskWrapper New pickleable wrapper object around a callable to be sent as tasks to child processes, pking sure that rofiling output is written before the call result is pushed back to the parent (and thus leaving the child prone to termination) Patch New helper object for declaratively constructing patches for subcomponents, reworked from `_apply_patches_generic()`; the patches are: - `pool`: Patch `multiprocessing.pool.Pool`, so that each tasks write profiling output before pushing result back to the parent process - `process`: Patch `multiprocessing.process.BaseProcess`, so that each child process write profiling output before exiting - `logging`: Patch logging functions in `multiprocessing.util`, so that the internal logs of `multiprocessing` are teed to the session cache's debug logs wrap_get_tasks(), wrap_guarded_task_generation() New wrappers for the eponymous methods in `multiprocessing.pool.Pool` which wraps the input callables in a `TaskWrapper` apply() - Merged with `_apply_patches_generic()` which it was a thin wrapper for - Added param `patches` for explicitly supplying which patches to apply - Refactored internals to work with `Patch` objects line_profiler/rc/line_profiler.toml [tool.line_profiler.child_processes.multiprocessing] Removed key `intercept_logs` (superseded by `patches.logging`) [tool.line_profiler.child_processes.multiprocessing.patches] New subtable of boolean switches for whether to apply each of the aforementioned patches [toll.line_profiler.child_processes.multiprocessing.polling] Updated obsolete comments

line_profiler/_child_process_profiling/multiprocessing_patches.py Patch.apply(static=...) Changed default to true _PATCHES['pool'].targets['Pool']['_get_tasks'] Updated to work with the new default line_profiler/cleanup.py::Cleanup.patch() Added new parameter `static` to toggle whether to use `inspect.getattr_static()` to get the pre-patch value; now defaulting to true, while the old behavior is equivalent to false

tests/test_child_procs.py _preserve_obj_attributes _preserve_attributes[.fetch_current_values()] New param `static` for toggling whether to use `inspect.getattr_static()` _preserve_attributes.compare_with_current_values() - New param `static` for toggling whether to use `inspect.getattr_static()` - Refactored to give more useful output and to only raise the `AssertionError` after all comparisons have been made get_patched_attributes() New convenience function superseding the `patched_attributes` fixture test_cache_setup_main_process() Refactored to test the various combinations of applying `multiprocessing` patches test_apply_mp_patches_{success,failure}() - Refactored to test the various combinations of applying `multiprocessing` patches - Further narrowed the scope of the `@pytest.mark.retry` on the failing case: retries are only attempted when the fully-deterministic `pool` patch isn't used test_profiling_{multiproc_script,bare_python}_failure() No longer using the `@pytest.mark.retry` decorator because the 'pool' patch is used by default test_load_pth_hook() Updated to check (against) the correct patched attributes

line_profiler/_child_process_profiling/cache.py::LineProfilingCache _additional_data New free-for-all field allowing for unstructured data to be `.dump()`-ed and `.load()`-ed .dump(), .load(), ._from_path() Updated implementations to handle `._additional_data` _warn_possible_lack_of_stats() New method for arbitrary (child or main) process to communicate that some PIDs are expected to not have written to their profiling-stats file ("maybe-bad PIDs"); this fixes the stray empty-file warning sometimes emitted when the 'process' patch is not applied _get_pids_possibly_lacking_stats() New method for retrieving said PIDs line_profiler/_child_process_profiling/multiprocessing_patches.py Massively refactored. _PollerArgs.new() Refactored from `.from_config()` MPConfig New object for holding the submodule-specific config options extracted from `LineProfilingCache.load().config` _Patch New protocol for objects (e/g. `Patch`) capable of `.apply()`-ing themselves via a `LineProfilingCache` and providing a `.summary` of the patched entities Patch.add_method() New convenience method for adding a single (instance, class, or static) method to be patched wrap_repopulate_pool_static() New wrapper for `Pool._repopulate_pool_static()` which alerts the session cache to "maybe-bad PIDs"; this is needed because when wrapping only `Pool` and not `Process`, processes which exited without receiving a single task would leave their profiling-stats file empty (see `LineProfilingCache._warn_possible_lack_of_stats()` above) wrap_terminate(), wrap_bootstrap() Simplified implementations using `MPConfig` RebootForkserverPatch, RunpyPatch New helper `_Patch` classes consisting of functionalities previously embedded in `apply()` ResourceTrackerPatch New helper `_Patch` class patching `multiprocessing.resource_tracker.ResourceTracker` to set up the reporting of "maybe-bad PIDs"; this is needed because the resource-tracker server, while not directly handling the profiled parallel workload, is long-lived and may not have exited (and thus written profiling stats) at stats-gathering time; supersedes `cache.py::_DEFAULT_GATHER_STATS_EXCLUDES` apply() - Renamed parameter `lp_cache` -> `cache` - Simplified implementation because the components are all outsourced to `_Patch` objects tests/test_child_procs.py test_cache_dump_load() Added check for the round-tripping of `._additional_data` _test_apply_mp_patches() Fixed minor typing issue _GLOBAL_MINIMAL_PATCHES, DEFAULT_GLOBAL_PATCHES, etc. Updated definitions so that we check for the patching and restoration of the correct attributes

tests/conftest.py RetryMarker Refactored from `_RetryHelper`: - `pytest_pyfunc_call()` now issues a warning for malformed `@pytest.mark.retry(...)` instead of erroring out - New attribute `require` for controlling whether to require the test to pass on 'all' or 'any' retries; old and default behavior equivalent to `'any'` - New attribute `is_active` for representing `@pytest.mark.retry(...)` setups where retries shouldn't be attempted - Constructor method `from_test_func()` refactored from `get_helper()`, and now always return an instance instead of `None` - New constructor method `from_arguments()` with doctest RetryMarkerWarning New warning class for malformed `@pytest.mark.retry(...)` markers RetryConditionFailure Refactored from `_RetryFailure` tests/test_retry_tests.py <General> Simplified implementation of `_TestModule` fixtures test_bad_markers() New test that malformed `@pytest.mark.retry(...)` results in the appropriate warnings instead of test failure test_requirement() New test that `@pytest.mark.retry(require=...)` works as expected tests/test_child_procs.py::test_apply_mp_patches_failure() Now retrying with `require='all'` to test AGAINST flakiness

line_profiler/cleanup.py::Cleanup.cleanup() Now taking an optional `reason` argument which is reflected in log messages line_profiler/_child_process_profiling/cache.py ::LineProfilingCache._handle_signal() - Updated the log message written to (hopefully) make it easier to catch corner cases where the `.cleanup()` call somehow didn't happen - Added a `reason` to the `.cleanup()` call line_profiler/_child_process_profiling/multiprocessing_patches.py ::wrap_bootstrap() line_profiler/curated_profiling.py::CuratedProfilerContext.uninstall() Added `reason`s to the `.cleanup()` calls

line_profiler/_child_process_profiling/multiprocessing_patches.py TaskWrapper, wrap_get_tasks(), wrap_guarded_task_generation() Removed wrap_worker() New wrapper function for wrapping `multiprocessing.pool.worker()` and writing profiling output ASAP as it catches the sentinel value

line_profiler/_child_process_profiling/cache.py::LineProfilingCache _stats_dumper New private attribute superseding `._dump_stats` cleanup() No longer overriding the base-class method _setup_in_child_process() Now setting up the `._stats_dumper` attribute instead of `._dump_stats` _handle_signal() - No longer calling `.cleanup()` to avoid deadlocks - Added handling for when the signal handler was originally set from outside of Python line_profiler/_child_process_profiling/multiprocessing_patches.py wrap_runner(), wrap_bootstrap() No longer calling `LineProfilingCache.load().cleanup()` with the `new_thread` argument

line_profiler/_child_process_profiling/cache.py::LineProfilingCache _stats_dumper Now a convenience subtype of `Cleanup` which can be directly called _setup_in_child_process(), _handle_signal() Updated setup and handling for `._stats_dumper` line_profiler/_child_process_profiling/multiprocessing_patches.py TaskWrapper, wrap_get_tasks(), wrap_guarded_task_generation() Partially reverted deletion because we need these on Windows wrap_worker() Patch now only applied when on POSIX (Windows uses the above instead)

tests/test_child_procs.py::test_apply_mp_patches_{success,failure}() - Updated implementations - Loosened checks against empty-file warnings - Updated retrying rules for `test_apply_mp_patches_failure()`: - Only retrying when on Windows and `patch_pool` is false - Reverted back to `require='any'`

tests/test_retry_tests.py::test_retry_conditions() Fix failure when running in a narrow terminal; see `pytest-mark-retry` commit d53a515

TTsangSC mentioned this pull request Apr 14, 2026

Update CI - Drop 3.8 / 3.9 #428

Merged

TTsangSC changed the title ~~FEAT: extend profiling to child processes~~ [Draft] FEAT: extend profiling to child processes Apr 14, 2026

TTsangSC force-pushed the profile-child-processes branch from 2cd2ed4 to f9a37af Compare April 15, 2026 03:48

TTsangSC added 26 commits April 15, 2026 20:14

Removed redundant check in kernprof

f2c8565

Renamed submodule

4d6900a

Fixed stat-aggregation bug

a51411f

kernprof.py _dump_filtered_stats() Fixed bug where if no tempfile remains, the `extra_line_stats` are not merged into the dumped stats _prepare_child_profiling_cache() Now setting the `.profiler` of the returned cache object

Streamlined debug output in _child_process_profiling

ab8b6a0

Test local funcs in the profiled script

3288fdf

tests/test_child_procs.py::test_multiproc_script_sanity_check() Now parametrized to test passing the function defined in the test module itself to `multiprocessing`

Consolidated tests

7034479

tests/test_child_procs.py test_running_multiproc_{module,literal_code}() Integrated into `test_running_multiproc_script()` test_running_multiproc_script() Extended parametrization

More debug output, flexible multiprocessing patching

d12b26e

Fix test failing corner case

96b8975

TTsangSC changed the title ~~[Draft] FEAT: extend profiling to child processes~~ FEAT: extend profiling to child processes Apr 29, 2026

Erotemic reviewed May 2, 2026

View reviewed changes

TTsangSC added 7 commits May 4, 2026 09:11

Finish .pth code migration

3a7ce87

_line_profiler_hooks.py write_pth_hook() Removed (superseded by `LineProfilingCache.write_pth_hook()`) load_pth_hook() Fixed broken relative imports from `line_profiler` tests/test_child_procs.py Fixed import location of `load_pth_hook()`

TTsangSC added 17 commits May 6, 2026 05:56

Fix stats file detection in test_apply_mp_patches_*

a7075d6

Moved code for formatting calls

c255740

line_profiler/line_profiler_utils.py::CallbackRepr Migrated from `line_profiler/cleanup.py::_CallbackRepr`

Fix tests/test_retry_tests.py

b347523

tests/test_retry_tests.py::test_retry_conditions() Fix failure when running in a narrow terminal; see `pytest-mark-retry` commit d53a515

		)


		@pytest.mark.retry(_NUM_RETRIES,

Conversation

TTsangSC commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Usage

Highlights

Explanation

Code changes

New code (click to expand)

line_profiler/_threading_patches.py

line_profiler/cleanup.py

line_profiler/curated_profiling.py

line_profiler/_child_process_profiling/

tests/conftest.py

tests/test_retry_tests.py

tests/test_child_procs.py

Modified code (click to expand)

line_profiler/line_profiler.py::LineStats

line_profiler/rc/line_profiler.toml

kernprof.py

tests/test_child_procs.py

Caveats

TODO

Credits

Notes

Footnotes

Uh oh!

TTsangSC commented Apr 15, 2026

Uh oh!

TTsangSC commented May 2, 2026

Uh oh!

Erotemic left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

The short answer

The long answer

What do we do?

Uh oh!

TTsangSC commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

TTsangSC commented Apr 14, 2026 •

edited

Loading

`line_profiler/_threading_patches.py`

`line_profiler/cleanup.py`

`line_profiler/curated_profiling.py`

`line_profiler/_child_process_profiling/`

`tests/conftest.py`

`tests/test_retry_tests.py`

`tests/test_child_procs.py`

`line_profiler/line_profiler.py::LineStats`

`line_profiler/rc/line_profiler.toml`

`kernprof.py`

`tests/test_child_procs.py`